MENLI: Robust Evaluation Metrics from Natural Language Inference
Recently proposed BERT-based evaluation metrics for text generation perform
well on standard benchmarks but are vulnerable to adversarial attacks, e.g.,
relating to information correctness. We argue that this stems (in part) from
the fact that they are models of semantic similarity. In contrast, we develop
evaluation metrics based on Natural Language Inference (NLI), which we deem a
more appropriate modeling. We design a preference-based adversarial attack
framework and show that our NLI-based metrics are much more robust to the
attacks than the recent BERT-based metrics. On standard benchmarks, our
NLI-based metrics outperform existing summarization metrics but perform below
SOTA MT metrics. However, when combining existing metrics with our NLI metrics,
we obtain both higher adversarial robustness (15%-30%) and higher-quality
metrics as measured on standard benchmarks (+5% to 30%).
Comment: TACL 2023 camera-ready; GitHub link fixed and Fig. 3 legend fixed.
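The combination described above can be sketched as a simple blend of a similarity-based metric score with an NLI entailment probability. This is an illustrative assumption on my part, not the paper's exact formulation: the function names, the [0, 1] score ranges, and the weight w are all hypothetical.

```python
# Hypothetical sketch: blending a similarity-based metric (e.g., a
# BERT-based score) with an NLI entailment probability. The weighted
# average and the weight w = 0.5 are illustrative assumptions.

def combine_scores(sim_score: float, nli_entail_prob: float, w: float = 0.5) -> float:
    """Blend a [0, 1] similarity score with a [0, 1] NLI entailment
    probability; w controls how much the NLI signal contributes."""
    return (1.0 - w) * sim_score + w * nli_entail_prob

# An adversarially perturbed hypothesis can keep high lexical/semantic
# similarity while NLI flags the factual error, pulling the score down.
clean = combine_scores(sim_score=0.92, nli_entail_prob=0.95)
adversarial = combine_scores(sim_score=0.90, nli_entail_prob=0.10)
```

The intuition is that similarity alone rewards surface overlap, so an attack that flips a fact barely moves it, while the entailment term collapses on the attacked hypothesis and dominates the combined score.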
Reproducibility Issues for BERT-based Evaluation Metrics
Reproducibility is of utmost concern in machine learning and natural language
processing (NLP). In the field of natural language generation (especially
machine translation), the seminal paper of Post (2018) has pointed out problems
of reproducibility of the dominant metric, BLEU, at the time of publication.
Nowadays, BERT-based evaluation metrics considerably outperform BLEU. In this
paper, we ask whether results and claims from four recent BERT-based metrics
can be reproduced. We find that reproduction of claims and results often fails
because of (i) heavy undocumented preprocessing involved in the metrics, (ii)
missing code and (iii) reporting weaker results for the baseline metrics. (iv)
In one case, the problem stems from correlating not to human scores but to a
wrong column in the csv file, inflating scores by 5 points. Motivated by the
impact of preprocessing, we then conduct a second study where we examine its
effects more closely (for one of the metrics). We find that preprocessing can
have large effects, especially for highly inflectional languages. In this case,
the effect of preprocessing may be larger than the effect of the aggregation
mechanism (e.g., greedy alignment vs. Word Mover's Distance).
Comment: EMNLP 2022 camera-ready (captions fixed).
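The wrong-column failure mode described above is easy to illustrate: segment-level correlations are typically a Pearson coefficient between metric scores and human judgments, and reading the wrong CSV column in place of the human scores silently changes the reported number. The data below is invented for illustration, not from the paper.

```python
# Illustrative sketch (invented data): segment-level Pearson correlation
# between metric scores and human judgments. Correlating against the
# wrong column produces a number that may still look plausible.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

metric_scores = [0.71, 0.63, 0.80, 0.55, 0.90]
human_scores  = [0.60, 0.58, 0.75, 0.50, 0.85]  # the intended target column
other_column  = [1, 2, 3, 4, 5]                 # e.g., a segment-id column

r_correct = pearson(metric_scores, human_scores)
r_wrong   = pearson(metric_scores, other_column)  # meaningless but nonzero
```

Sanity checks like asserting the target column's value range, or spot-checking a few rows against the raw annotations, would catch this class of error before results are reported.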
MIMO Is All You Need : A Strong Multi-In-Multi-Out Baseline for Video Prediction
The mainstream of the existing approaches for video prediction builds up
their models based on a Single-In-Single-Out (SISO) architecture, which takes
the current frame as input to predict the next frame in a recursive manner.
This way often leads to severe performance degradation when they try to
extrapolate a longer period of future, thus limiting the practical use of the
prediction model. Alternatively, a Multi-In-Multi-Out (MIMO) architecture that
outputs all the future frames at one shot naturally breaks the recursive manner
and therefore prevents error accumulation. However, only a few MIMO models for
video prediction have been proposed, and they have achieved only inferior
performance to date. The real strength of the MIMO model in this area is not
well noticed and is largely under-explored. Motivated by that, we conduct a
comprehensive investigation in this paper to explore how far a simple MIMO
architecture can go. Surprisingly, our empirical studies reveal that a simple
MIMO model can outperform the state-of-the-art work by a large margin, much
more than expected, especially in dealing with long-term error accumulation.
After exploring a number of ways and designs, we propose a new MIMO
architecture based on extending the pure Transformer with local spatio-temporal
blocks and a new multi-output decoder, namely MIMO-VP, to establish a new
standard in video prediction. We evaluate our model on four highly competitive
benchmarks (Moving MNIST, Human3.6M, Weather, KITTI). Extensive experiments
show that our model achieves first place on all the benchmarks with remarkable
performance gains and surpasses the best SISO model in all aspects, including
efficiency, quantitative results, and qualitative results. We believe our model
can serve as a new baseline to facilitate future research on video prediction
tasks. The code will be released.
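The SISO-versus-MIMO contrast above comes down to how one-step model error propagates. A toy numerical sketch (my illustration, not the paper's model): under simple multiplicative dynamics, a slightly biased one-step predictor compounds its bias when rolled out recursively, while a direct multi-step prediction of the final frame incurs the bias only once. The constant per-output bias assumed for the MIMO case is a simplifying assumption.

```python
# Toy sketch of error accumulation: recursive SISO rollout vs. one-shot
# MIMO prediction. Dynamics, bias, and horizon are illustrative only.

TRUE_RATE = 1.00   # true per-step dynamics: x_{t+1} = TRUE_RATE * x_t
MODEL_RATE = 1.05  # slightly biased learned one-step model
HORIZON = 10
X0 = 1.0

true_final = (TRUE_RATE ** HORIZON) * X0

# SISO: each prediction is fed back in, so the bias compounds over time.
siso_pred = X0
for _ in range(HORIZON):
    siso_pred = MODEL_RATE * siso_pred
siso_error = abs(siso_pred - true_final)

# MIMO: the whole horizon is predicted in one shot; assume the same
# one-step bias applies to each output once instead of compounding.
mimo_pred_final = MODEL_RATE * (TRUE_RATE ** (HORIZON - 1)) * X0
mimo_error = abs(mimo_pred_final - true_final)
```

With a 5% one-step bias over ten steps, the recursive rollout's error grows to roughly 63% of the true value while the direct prediction stays at 5%, which is the gap the abstract attributes to breaking the recursive manner.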
A ribose-functionalized NAD+ with unexpected high activity and selectivity for protein poly-ADP-ribosylation.
Nicotinamide adenine dinucleotide (NAD+)-dependent ADP-ribosylation plays important roles in physiology and pathophysiology. It has been challenging to study this key type of enzymatic post-translational modification, in particular for protein poly-ADP-ribosylation (PARylation). Here we explore chemical and chemoenzymatic synthesis of NAD+ analogues with ribose functionalized by terminal alkyne and azido groups. Our results demonstrate that azido substitution at the 3'-OH of nicotinamide riboside enables enzymatic synthesis of an NAD+ analogue with high efficiency and yield. Notably, the generated 3'-azido NAD+ exhibits unexpectedly high activity and specificity for protein PARylation catalyzed by human poly-ADP-ribose polymerase 1 (PARP1) and PARP2. Moreover, its derived poly-ADP-ribose polymers show increased resistance to human poly(ADP-ribose) glycohydrolase-mediated degradation. These unique properties lead to enhanced labeling of protein PARylation by 3'-azido NAD+ in cellular contexts and facilitate direct visualization and labeling of mitochondrial protein PARylation. The 3'-azido NAD+ provides an important tool for studying cellular PARylation.